Surrogate data testing is a method frequently applied to evaluate
the results of nonlinear time series analysis. Since the null
hypothesis tested against is a linear, Gaussian, stationary
stochastic process, a positive outcome may result not only from an
underlying nonlinear or even chaotic system, but also from, e.g., a
non-stationary linear one. We investigate the power of the test
against non-stationarity.



I. INTRODUCTION 

The field of nonlinear dynamics introduced the fascinating idea
that the apparently random behavior of a time series might have
been generated by a low-dimensional deterministic system [1]. Based
on the notions of chaos theory, different algorithms have been
invented to infer whether an observed time series is a realization
of a chaotic system, e.g. the estimation of the largest Lyapunov
exponent [2], the correlation dimension [3] and nonlinear
prediction [4]. There is hope to gain deeper insights into complex
systems like those from biology and physiology by applying these
methods.

However, the application of these methods to a finite, often noisy
set of measured data is not straightforward, see e.g. [5]-[9] and
references therein. For example, in order to claim a finite,
fractal correlation dimension, a scaling region of sufficient
length has to be established. Determining this scaling region by
eye or by some algorithm may lead to erroneous evidence of chaotic
behavior. In order to evaluate the analysis, it has become popular
to apply the method of surrogate data [9]. To this end, data are
generated which have the same linear statistical properties as the
original data but not the possible nonlinear ones. The same
algorithm that was applied to the original data is then applied to
many realizations of these surrogate data. A significant difference
between the value of the nonlinear feature for the original data
and its distribution for the surrogate data is taken as an
indication that the process underlying the original data is
deterministic [10], nonlinear [11]-[13] or even chaotic [14]-[16].

The explicit null hypothesis of surrogate data testing for
linearity is that the data were generated by a linear, stochastic,
Gaussian, stationary process, including a possible invertible
nonlinear observation function. Thus, a rejection of this
hypothesis does not necessarily mean that the data come from a
chaotic, i.e. some kind of stationary, nonlinear, deterministic,
process. They might also originate from a nonlinear stochastic or
even simply from a linear, stochastic, non-stationary process. In
this paper, we investigate the power of surrogate data testing
against non-stationarity. As the nonlinear feature we use the
correlation dimension. The behavior of correlation dimension
estimates has been investigated for the $1/f^{\alpha}$,
$\alpha > 1$ type of linear non-stationarity [17],[18]. For
physiological data, such $1/f$ behavior has been observed in heart
rate [19]. Often, physiological data are characterized by some kind
of oscillatory behavior, as in EEG, hormone secretion, breathing or
tremor. For such data, types of non-stationarity that introduce
some time dependency of the oscillating dynamics, e.g. a modulation
of frequency or amplitude, seem to be a natural violation of the
null hypothesis.

If the process is linear and the time dependency of the parameters,
and thus of the autocovariance function, is periodic in time, the
process is called cyclostationary [20]. Many other types of
non-stationarity in oscillatory processes are imaginable. We choose
cyclostationary processes because they allow in a simple way for a
parametric violation of the null hypothesis. Formally, these
processes can be expressed as higher dimensional autonomous
nonlinear stochastic processes. A special version of surrogate data
testing acting on segments of the data has been suggested to
analyze such data [9].

In the next section, we informally discuss the class of 
cyclostationary processes and introduce the two specific 
examples we use in Section III to investigate the power 
of surrogate data testing with respect to these types of 
non-stationarity. 



II. CYCLOSTATIONARY PROCESSES 

The parameters $a_i$ and $\sigma^2$ of a linear stochastic
autoregressive (AR) process $x(t)$:

$$x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + \epsilon(t), \qquad \epsilon(t) \sim \mathcal{N}(0, \sigma^2) \tag{1}$$

determine the autocovariance function $R(\tau)$:

$$R(\tau) = \langle x(t)\, x(t+\tau) \rangle . \tag{2}$$

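The spectrum $S(\omega)$ is given as the Fourier transform of the
autocovariance function:

$$S(\omega) = \frac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} R(\tau)\, e^{-i\omega\tau} . \tag{3}$$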



A possible first step to non-stationarity is to define a time
dependent spectrum $S(t,\omega)$ and, correspondingly, a time
dependent autocovariance function $R(t,\tau)$:

$$R(t,\tau) = \langle x(t)\, x(t+\tau) \rangle . \tag{4}$$

A cyclostationary process of periodicity $L$ is defined by:

$$R(t,\tau) = R(t+L,\tau) . \tag{5}$$

For the AR process of Eq. (1) this means that the parameters $a_i$
and $\sigma^2$ may change periodically.

As the process satisfying the null hypothesis of surrogate data
testing for linearity, we choose an autoregressive (AR) process of
order two:

$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2) . \tag{6}$$



In terms of physics, AR processes can be interpreted as a
combination of linear relaxators and linear damped oscillators
driven by noise. For an AR process of order two which describes a
damped oscillator, the parameters are related to the relaxation
time $\tau$ and period $T$ by:

$$a_1 = 2\cos(2\pi/T)\, \exp(-1/\tau) \tag{7}$$
$$a_2 = -\exp(-2/\tau) \tag{8}$$

The variance of the process $\mathrm{Var}(x_t)$ is given by:

$$\mathrm{Var}(x_t) = \frac{\sigma^2}{1 - a_1^2 - a_2^2 - \dfrac{2 a_1^2 a_2}{1 - a_2}} . \tag{9}$$
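As a consistency check that is not part of the original derivation, the denominator of Eq. (9) can be rearranged by elementary algebra into the form in which the AR(2) variance is often quoted. The steps below only use the symbols $a_1$, $a_2$ and $\sigma^2$ of Eqs. (6) and (9):

```latex
\begin{align*}
1 - a_1^2 - a_2^2 - \frac{2 a_1^2 a_2}{1 - a_2}
  &= \frac{(1 - a_2)(1 - a_2^2) - a_1^2 (1 - a_2) - 2 a_1^2 a_2}{1 - a_2} \\
  &= \frac{(1 - a_2)^2 (1 + a_2) - a_1^2 (1 + a_2)}{1 - a_2} \\
  &= \frac{(1 + a_2)\left[(1 - a_2)^2 - a_1^2\right]}{1 - a_2},
\end{align*}
so that
\[
\mathrm{Var}(x_t) = \frac{(1 - a_2)\,\sigma^2}{(1 + a_2)\left[(1 - a_2)^2 - a_1^2\right]} .
\]
```

The variance is finite and positive as long as $|a_2| < 1$ and $|a_1| < 1 - a_2$, which holds for the damped oscillator parameterization of Eqs. (7) and (8).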



We choose an AR2 process with $T = 10$, $\tau = 50$ and
$\sigma = 1$ as the process $x_0(t)$ that satisfies the null
hypothesis. Fig. 1a displays a realization of this process.

FIG. 1. Realizations of the processes investigated. (a) AR2 process
satisfying the null hypothesis. (b) Amplitude modulated process
with modulation depth of 0.3. (c) Period modulated process,
relative amplitude of modulation is 15%.
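A realization like the one in Fig. 1a can be generated with a few lines of NumPy. The following is a minimal sketch, not the code used for the paper: the function names, the burn-in length and the random seed are assumptions made for illustration. It evaluates Eqs. (7) and (8), iterates Eq. (6), and compares the sample variance with Eq. (9).

```python
import numpy as np

def ar2_coefficients(period, relax_time):
    """Eqs. (7) and (8): AR2 parameters of a noise-driven damped oscillator."""
    a1 = 2.0 * np.cos(2.0 * np.pi / period) * np.exp(-1.0 / relax_time)
    a2 = -np.exp(-2.0 / relax_time)
    return a1, a2

def ar2_variance(a1, a2, sigma=1.0):
    """Eq. (9): variance of the stationary AR2 process."""
    return sigma**2 / (1.0 - a1**2 - a2**2 - 2.0 * a1**2 * a2 / (1.0 - a2))

def simulate_ar2(n, a1, a2, sigma=1.0, burn_in=2000, rng=None):
    """Iterate Eq. (6); the burn-in (an assumed length) discards the transient."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma, n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + eps[t]
    return x[burn_in:]

# Parameters of the null-hypothesis process: T = 10, tau = 50, sigma = 1.
a1, a2 = ar2_coefficients(period=10.0, relax_time=50.0)
x0 = simulate_ar2(100_000, a1, a2, rng=np.random.default_rng(0))
print("a1 =", a1, " a2 =", a2)
print("variance from Eq. (9):", ar2_variance(a1, a2))
print("sample variance:      ", x0.var())
```

For a long realization the sample variance should scatter around the value of Eq. (9); the agreement is statistical, not exact.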

The oscillatory behavior with a mean period of 10 time steps is
clearly visible as well as the natural variability of period and
amplitude. Fig. 2 (solid line) shows the estimated spectrum of the
process. The spectrum was estimated by averaging 100 periodograms,
i.e. the squared absolute values of the Fourier transform of the
data.

FIG. 2. Estimated spectra of the processes shown in Fig. 1. The
spectra corresponding to Fig. 1a and b are not distinguishable
(solid line). Period modulated process (dashed line).
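The averaged periodogram can be sketched as follows. The text does not state whether the 100 periodograms stem from independent realizations or from segments of one record; this sketch assumes non-overlapping segments of a single long realization, and the segment length of 1024 and the normalization by the segment length are likewise assumptions.

```python
import numpy as np

def averaged_periodogram(x, n_segments):
    """Average the periodograms |FFT|^2 / N of non-overlapping segments."""
    seg_len = len(x) // n_segments
    segments = np.reshape(x[: seg_len * n_segments], (n_segments, seg_len))
    segments = segments - segments.mean(axis=1, keepdims=True)
    spectra = np.abs(np.fft.rfft(segments, axis=1)) ** 2 / seg_len
    freqs = np.fft.rfftfreq(seg_len, d=1.0)  # in cycles per time step
    return freqs, spectra.mean(axis=0)

# AR2 test signal with T = 10, tau = 50, sigma = 1, see Eqs. (6)-(8).
rng = np.random.default_rng(1)
a1 = 2.0 * np.cos(2.0 * np.pi / 10.0) * np.exp(-1.0 / 50.0)
a2 = -np.exp(-2.0 / 50.0)
n = 100 * 1024
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + eps[t]

freqs, spec = averaged_periodogram(x, n_segments=100)
peak_freq = freqs[np.argmax(spec)]
print("spectral peak at frequency", peak_freq, "cycles per time step")
```

The peak should lie close to $1/T = 0.1$ cycles per time step, with a width set by the relaxation time.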

A broad peak, typical for a stochastically driven linear damped
oscillator, can be seen. Based on the AR2 process of Eq. (6), we
now introduce two parameterized violations of this stationary,
linear, stochastic process in order to investigate the power of
surrogate data testing with respect to non-stationarity.

For the first violation of stationarity in the frame of
cyclostationary processes, we choose a simple amplitude modulation,
corresponding by Eq. (9) to a periodicity of the variance of the
driving noise. Based on the stationary AR2 process $x_0(t)$, the
amplitude modulated process $x_{\mathrm{amp}}(t)$ is given by:

$$x_{\mathrm{amp}}(t) = \left(1 + \mathrm{Mod}_{\mathrm{amp}} \sin(2\pi t/T_{\mathrm{mod}})\right) x_0(t) . \tag{10}$$

$\mathrm{Mod}_{\mathrm{amp}}$, the modulation depth, parameterizes
the violation of the null hypothesis. $T_{\mathrm{mod}}$ determines
the modulation period. Fig. 1b displays a realization of this
process with $T_{\mathrm{mod}} = 250$ and
$\mathrm{Mod}_{\mathrm{amp}} = 0.3$ for three periods of the
modulation. Compared to Fig. 1a, the non-stationarity is hardly
visible. Because the modulation period is long compared to the
period of the process, its spectrum is not distinguishable from
that of the stationary process in Fig. 2.
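The amplitude modulation of Eq. (10) is a pointwise multiplication and can be sketched as below. The function name is hypothetical, and the constant carrier is only a placeholder that makes the envelope visible; in practice `x0` would be a realization of the stationary AR2 process.

```python
import numpy as np

def amplitude_modulate(x0, mod_amp, t_mod):
    """Eq. (10): multiply x0(t) by 1 + Mod_amp * sin(2*pi*t / T_mod)."""
    t = np.arange(len(x0))
    return (1.0 + mod_amp * np.sin(2.0 * np.pi * t / t_mod)) * x0

# Placeholder carrier (all ones) over three modulation periods, so that the
# output equals the modulation envelope itself.
x0 = np.ones(750)
x_amp = amplitude_modulate(x0, mod_amp=0.3, t_mod=250)
print("envelope range:", x_amp.min(), "to", x_amp.max())
```

With a modulation depth of 0.3 the envelope varies between roughly 0.7 and 1.3, so the local variance of a modulated stochastic process changes by a factor of about $(1.3/0.7)^2$ over one modulation period.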

For the second violation of stationarity, we choose a modulation of
the period $T$ of the AR2 process with period $T_{\mathrm{mod}}$
and amplitude $\mathrm{Mod}_T$ around the mean period
$T_{\mathrm{mean}} = 10$. This leads to a time dependency of the
parameter $a_1$ of the AR2 process:

$$T(t) = T_{\mathrm{mean}} + \mathrm{Mod}_T \sin(2\pi t/T_{\mathrm{mod}}) \tag{11}$$
$$a_1(t) = 2\cos(2\pi/T(t))\, \exp(-1/\tau) . \tag{12}$$

$\mathrm{Mod}_T$ parameterizes the violation of the null
hypothesis. According to Eq. (9), the time dependency of $a_1(t)$
causes a time dependency of the variance of the process. The effect
of a changing variance is already covered by the first process,
Eq. (10). To investigate only the effect of a changing period of
the process here, we use Eq. (9) to adjust the variance
$\sigma^2(t)$ of the driving noise such that the variance of the
process is constant:



$$\sigma^2(t) = \frac{\sigma^2}{1 - a_1^2 - a_2^2 - \dfrac{2 a_1^2 a_2}{1 - a_2}} \tag{13}$$
$$\qquad\quad \times \left(1 - a_1(t)^2 - a_2^2 - \frac{2 a_1(t)^2 a_2}{1 - a_2}\right) , \tag{14}$$



where $a_1$ and $a_2$ denote the parameters of the process $x_0(t)$
satisfying the null hypothesis. Fig. 1c displays a realization of
this process with $T_{\mathrm{mod}} = 250$ and
$\mathrm{Mod}_T = 1.5$. Again, compared to Fig. 1a, the
non-stationarity is hardly visible. Fig. 2 (dashed line) shows the
estimated spectrum of the process. The spectrum shows two peaks at
the corresponding frequencies due to the specific type of
modulation chosen.
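The period modulated process can be sketched as follows. The function names and default arguments are assumptions for illustration; the sketch applies Eq. (9) pointwise in time to rescale the driving noise, which is our reading of Eqs. (13) and (14) and amounts to a quasi-stationary approximation.

```python
import numpy as np

def ar2_variance_factor(a1, a2):
    """Denominator of Eq. (9), i.e. Var(x) = sigma^2 / factor."""
    return 1.0 - a1**2 - a2**2 - 2.0 * a1**2 * a2 / (1.0 - a2)

def simulate_period_modulated(n, t_mean=10.0, tau=50.0, mod_t=1.5,
                              t_mod=250.0, sigma=1.0, rng=None):
    """AR2 process with periodically modulated period, Eqs. (11)-(14)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n)
    period = t_mean + mod_t * np.sin(2.0 * np.pi * t / t_mod)        # Eq. (11)
    a1_t = 2.0 * np.cos(2.0 * np.pi / period) * np.exp(-1.0 / tau)   # Eq. (12)
    a1_0 = 2.0 * np.cos(2.0 * np.pi / t_mean) * np.exp(-1.0 / tau)   # Eq. (7)
    a2 = -np.exp(-2.0 / tau)                                         # Eq. (8)
    # Eqs. (13)-(14): time dependent standard deviation of the driving noise.
    sigma_t = sigma * np.sqrt(ar2_variance_factor(a1_t, a2)
                              / ar2_variance_factor(a1_0, a2))
    eps = rng.normal(size=n) * sigma_t
    x = np.zeros(n)
    for k in range(2, n):
        x[k] = a1_t[k] * x[k - 1] + a2 * x[k - 2] + eps[k]
    return x, sigma_t

x_per, sigma_t = simulate_period_modulated(8192, rng=np.random.default_rng(2))
print("noise standard deviation ranges from", sigma_t.min(), "to", sigma_t.max())
```

For $\mathrm{Mod}_T = 0$ the noise amplitude reduces to the constant $\sigma$ and the stationary process of Eq. (6) is recovered.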



III. POWER OF THE TEST 

As the nonlinear feature to investigate the power of surrogate data
testing against the two violations of stationarity, we use the
correlation dimension. The phase space is reconstructed by delay
embedding. The delay is chosen equal to the lag at which the
autocorrelation function first crosses zero.
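The reconstruction step can be sketched as below. Both function names are hypothetical, and "first crosses zero" is interpreted here as the smallest lag at which the sample autocorrelation is no longer positive, which is an assumption about the exact rule.

```python
import numpy as np

def first_zero_crossing_lag(x, max_lag=None):
    """Smallest lag at which the sample autocorrelation is <= 0."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    max_lag = len(x) // 2 if max_lag is None else max_lag
    for lag in range(1, max_lag + 1):
        if np.dot(x[:-lag], x[lag:]) <= 0.0:
            return lag
    raise ValueError("autocorrelation does not cross zero within max_lag")

def delay_embed(x, dim, delay):
    """Matrix whose row i is (x[i], x[i + delay], ..., x[i + (dim-1)*delay])."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("time series too short for this embedding")
    return np.column_stack([x[i * delay: i * delay + n_vectors]
                            for i in range(dim)])

# Deterministic test signal with period 10: the autocorrelation follows
# cos(2*pi*lag/10), which becomes negative for the first time at lag 3.
t = np.arange(1000)
x = np.cos(2.0 * np.pi * t / 10.0)
lag = first_zero_crossing_lag(x)
vectors = delay_embed(x, dim=3, delay=lag)
print("delay:", lag, " embedded shape:", vectors.shape)
```

For an oscillation with a period of 10 time steps this rule yields a delay of about a quarter period.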

The correlation dimension $D_2$ is defined by:

$$D_2 = \lim_{r \to 0} \frac{d \ln C(r)}{d \ln r} , \tag{15}$$

where $C(r)$, the correlation integral, is given by:

$$C(r) = \mathrm{const} \sum_{i=1}^{N-\mu} \sum_{j=i+\mu}^{N} \Theta\left(r - |x(i) - x(j)|\right) , \tag{16}$$

including the Theiler correction $\mu$ [21], which we choose equal
to the mean period, i.e. 10 time steps. The canonical procedure to
establish a finite correlation dimension is to show the existence
of a scaling region for small $r$ where Eq. (15) holds and where
the estimate stays constant for high enough embedding dimensions.
For all processes investigated here, the true correlation dimension
is infinite. Following the idea of surrogate data testing, we fix
an algorithm to obtain a finite value from the correlation integral
and look for differences between the surrogate and the original
data. To this end, we apply Theiler and Lookman's "rule of five"
chord estimator [22] and choose their $R_0$ equal to the standard
deviation of the data. For such a large $R_0$ we no longer examine
the small scale behavior of Eq. (15). We are aware that we should
no longer call this quantity correlation dimension. It has been
termed "dimensional complexity".
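The correlation integral of Eq. (16) with Theiler correction can be sketched as follows. This is not the "rule of five" chord estimator: the sketch only evaluates $C(r)$ and a simple two-point slope of $\ln C$ versus $\ln r$. The maximum norm and the normalization by the number of admitted pairs are assumptions, since the text fixes neither the norm nor the constant.

```python
import numpy as np

def correlation_sum(vectors, r, theiler):
    """Eq. (16): fraction of pairs (i, j) with j - i >= theiler and distance < r.

    Distances use the maximum norm (an assumption of this sketch).
    """
    n = len(vectors)
    count = 0
    pairs = 0
    for i in range(n - theiler):
        dist = np.max(np.abs(vectors[i + theiler:] - vectors[i]), axis=1)
        count += np.count_nonzero(dist < r)
        pairs += len(dist)
    return count / pairs

def two_point_slope(vectors, r_small, r_large, theiler):
    """Slope of ln C(r) versus ln r between two radii, cf. Eq. (15)."""
    c_small = correlation_sum(vectors, r_small, theiler)
    c_large = correlation_sum(vectors, r_large, theiler)
    return np.log(c_large / c_small) / np.log(r_large / r_small)

# Sanity check on points scattered uniformly on a line segment, for which
# the slope at small radii should be close to one.
rng = np.random.default_rng(4)
points = rng.uniform(size=(1000, 1))
print("slope:", two_point_slope(points, 0.01, 0.02, theiler=1))
```

For the time series studied here, `vectors` would be the delay vectors and `theiler` the mean period of 10 time steps. The double loop over pairs is what makes the feature expensive for records of length 8192.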



The surrogate data are produced by the FFT algorithm [9]. For each
degree of violation of the null hypothesis, 50 independent
surrogate data sets of length 8192 were generated. Denoting the
"correlation dimension" of the original data by $f$, the mean of
the distribution of this feature for the surrogate data by
$\mu_{\mathrm{surr}}$ and its variance by
$\sigma^2_{\mathrm{surr}}$, the result is displayed as:

$$z = \frac{|f - \mu_{\mathrm{surr}}|}{\sigma_{\mathrm{surr}}} . \tag{17}$$
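The FFT surrogate algorithm is commonly described as randomizing the Fourier phases while keeping the Fourier amplitudes. A minimal sketch under that description follows; the function name is hypothetical, and adding a uniformly distributed random phase to each coefficient is used here as an equivalent way of drawing new phases.

```python
import numpy as np

def fft_surrogate(x, rng):
    """Phase-randomized surrogate with the same periodogram as x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    phases[0] = 0.0          # leave the zero-frequency term (the mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0     # the Nyquist term must stay real for even n
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)

rng = np.random.default_rng(6)
x = rng.normal(size=8192)
surrogates = [fft_surrogate(x, rng) for _ in range(50)]
same_spectrum = np.allclose(np.abs(np.fft.rfft(surrogates[0])),
                            np.abs(np.fft.rfft(x)))
print("periodogram preserved:", same_spectrum)
```

Because the periodogram is preserved exactly, so is the sample autocovariance function of the periodically continued record, which is the sense in which the surrogates share the linear properties of the data.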



It was confirmed that the distribution of the feature is
sufficiently well described by a Gaussian distribution. Thus, $z$
can be related to a confidence interval, since for 50 realizations
the $t$-distribution of
$(f - \mu_{\mathrm{surr}})/\sigma_{\mathrm{surr}}$ is well
approximated by a Gaussian distribution and $z = 1.96$ corresponds
to the 5% level of significance.
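The statistic of Eq. (17) is straightforward to compute once the feature has been evaluated for the original data and for each surrogate. In the sketch below the function name and the use of the unbiased sample standard deviation (`ddof=1`) are assumptions, and the three surrogate values are made-up numbers chosen only so that the arithmetic can be followed by hand.

```python
import numpy as np

def surrogate_z(feature_original, features_surrogate):
    """Eq. (17): distance of the original feature from the surrogate mean,
    in units of the surrogate standard deviation."""
    mu_surr = np.mean(features_surrogate)
    sigma_surr = np.std(features_surrogate, ddof=1)
    return abs(feature_original - mu_surr) / sigma_surr

# Toy numbers: surrogate mean 2, sample standard deviation 1, original value 4.
z = surrogate_z(4.0, [1.0, 2.0, 3.0])
print("z =", z, " rejected at the 5% level:", z > 1.96)
```

In the study itself, `features_surrogate` would hold the 50 values of the dimension estimate obtained from the surrogate data sets.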

In general, investigations of the power of a test follow a
procedure different from that outlined above. For a certain
significance level, e.g. 5%, and different degrees of violation of
the null hypothesis, numerous realizations, e.g. 1000, of the
process are generated and the fraction of rejected null hypotheses
is reported. Due to the high computational burden of calculating
the correlation integral, this procedure is not feasible here. The
above procedure has the drawback that the results depend on the
single realization that is used as the basis for the surrogates. We
repeated the analysis reported below for independent realizations
and found no qualitative differences between them.
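The conventional procedure, which was not feasible for the correlation integral, can be illustrated with a deliberately cheap toy test. Everything in this sketch is an assumption made for illustration: the data are independent Gaussian values, the null hypothesis is "mean = 0", and the degree of violation is the true mean `shift`. Only the structure, counting rejections over many realizations, corresponds to the procedure described above.

```python
import numpy as np

def estimate_power(simulate, rejects, n_realizations, rng):
    """Fraction of independent realizations for which the test rejects."""
    n_rejected = sum(bool(rejects(simulate(rng))) for _ in range(n_realizations))
    return n_rejected / n_realizations

def make_simulator(shift, n_samples=25):
    """Toy process: n_samples Gaussian values with mean `shift`, unit variance."""
    return lambda rng: rng.normal(loc=shift, scale=1.0, size=n_samples)

def rejects_zero_mean(sample):
    """Two-sided 5% z-test of 'mean = 0' for known unit variance."""
    return abs(np.mean(sample)) * np.sqrt(len(sample)) > 1.96

rng = np.random.default_rng(3)
for shift in (0.0, 0.4, 0.8):
    power = estimate_power(make_simulator(shift), rejects_zero_mean, 1000, rng)
    print(f"shift = {shift}: rejection fraction = {power:.3f}")
```

Without violation the rejection fraction should scatter around the nominal 5% level, and it should approach one as the violation grows.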

For the first violation of the null hypothesis, we increase
$\mathrm{Mod}_{\mathrm{amp}}$ in Eq. (10) from zero, i.e. no
violation, to 0.5 in steps of 0.1. The distribution of these data
is not Gaussian for $\mathrm{Mod}_{\mathrm{amp}} > 0$. Thus, the
amplitude adjusted surrogate data algorithm was applied. The
deviation from Gaussianity is weak for the range of violations
chosen. We also applied the algorithm without amplitude adjustment
and did not find significantly different results.
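The amplitude adjusted algorithm is commonly described in three steps: rescale the data to a Gaussian series with the same rank order, phase-randomize that series, and reorder the original values according to the ranks of the result. The sketch below follows this description; the function name is hypothetical and details such as the handling of ties are not addressed.

```python
import numpy as np

def aaft_surrogate(x, rng):
    """Amplitude adjusted FFT surrogate: a permutation of the values of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x))
    # Step 1: Gaussian series with the same rank order as x.
    gauss = np.sort(rng.normal(size=n))[ranks]
    # Step 2: randomize the Fourier phases of the Gaussian series.
    spectrum = np.fft.rfft(gauss)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    phases[0] = 0.0
    if n % 2 == 0:
        phases[-1] = 0.0
    randomized = np.fft.irfft(spectrum * np.exp(1j * phases), n=n)
    # Step 3: reorder the original values to follow the ranks of step 2.
    return np.sort(x)[np.argsort(np.argsort(randomized))]

rng = np.random.default_rng(7)
x = rng.exponential(size=1024)          # clearly non-Gaussian toy data
s = aaft_surrogate(x, rng)
print("same amplitude distribution:", np.array_equal(np.sort(s), np.sort(x)))
```

By construction the surrogate has exactly the amplitude distribution of the data, while its spectrum matches that of the data only approximately.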

Fig. 3 displays the result of the simulation study. The value of
$z$ is shown as a function of the embedding dimension for different
degrees of violation of the null hypothesis.

FIG. 3. Results of the simulation study for the amplitude modulated
process. Shown is $z$ in dependence on the embedding dimension for
different degrees $\mathrm{Mod}_{\mathrm{amp}}$ of violation
(○ = 0, + = 0.1, □ = 0.2, × = 0.3, △ = 0.4, * = 0.5).

As expected, without any violation, the $z$ values stay within the
$2\sigma$ region given by $z < 1.96$. A modulation depth
$\mathrm{Mod}_{\mathrm{amp}}$ of 0.1 and 0.2 leads to results at
the border of 5% significance. Starting from
$\mathrm{Mod}_{\mathrm{amp}} = 0.3$, see Fig. 1b, the null
hypothesis is clearly rejected at the 5% level of significance
whenever the embedding dimension is large enough to reconstruct the
second order process appropriately.

To investigate the effect of a variation in the period of the
linear stochastic process, we increase $\mathrm{Mod}_T$ in
Eqs. (11,12) from zero to three. The distribution of these data is
Gaussian, independent of the value of $\mathrm{Mod}_T$. Thus, no
amplitude adjustment was necessary. Again, the distribution of the
feature is sufficiently well described by a Gaussian distribution.
Fig. 4 displays the result of the simulation study.

FIG. 4. Results of the simulation study for the period modulated
process. Shown is $z$ in dependence on the embedding dimension for
different degrees $\mathrm{Mod}_T$ of violation
(○ = 0, + = 1, □ = 1.5, × = 2, △ = 3).

For all degrees of violation, the violation is not detected when
the embedding dimension is too small to unfold the dynamics in
phase space. Otherwise, a modulation of the period of 15%, see
Fig. 1c, leads to a clear rejection of the null hypothesis at the
5% level of significance.



IV. CONCLUSION



The simulation studies reported in this paper indicate that
surrogate data testing for linear, stochastic, Gaussian, stationary
processes is powerful against a violation of the assumption of
stationarity. Thus, a significant result of the test does not
necessarily indicate a nonlinear or even chaotic process underlying
the data. It might simply have been caused by a non-stationarity of
the process.